Background

An enormous benefit of remote sensing is the capacity to discern different materials based on their spectral reflectance, allowing for the determination of land cover types across large areas. This categorization of data is fundamental for comprehending the distribution and evolution of land cover types, providing valuable insight regarding the impacts of changing environmental conditions under climate change.

While there are multiple approaches to performing land cover classification, this investigation will focus on a supervised approach, specifically a decision tree classifier to assess the land cover in southern Santa Barbara County. A decision tree classifier classifies pixels within the multi-spectral image involves using a training data set, where each pixel in the training set has data on it’s land cover classification and it’s spectral properties. Using this training data, conditions are developed regarding the attributes required for characterization of various land cover types, thus when a applied to a new spectral image, the trained decision tree is able to identify land cover type for the individual pixels.1

As residents of Santa Barbara, California, the purpose of this spectral analysis is to determine the land cover types for southern Santa Barbara County using multi-spectral imagery and location data for the following landcover types :

credit: this lab is based on a materials developed by Chris Kibler.

Data

Landsat Thematic Mapper

In order to train our decision tree, the training data is a scene from Landsat 5 collected on September 25, 2007. This scene is a collection 2 surface reflactnace product and contains spectral-data of bands 1, 2, 3, 4, 5, and 7.

Study Area and Training Data

The study area of interested is defined by a polygon representing southern Santa Barbara County. The training data file for southern Santa Barbara county contains polygons representing a singular land cover type out of our 4 land cover types of interest: green vegetation, dry grass or soil, urban, and water.

Workflow

library(sf)
library(terra)
library(here)
library(dplyr)
library(rpart)
library(rpart.plot)
library(tmap)

rm(list = ls())

here::i_am("SBLandCover.Rmd")
setwd(here())

Manipulating landsat data

In order to assess the land cover type across southern Santa Barbara county, we must first train a decision tree with training data regarding the spectral-reflectance information characteristic of each landcover type. For this investigation the training data is a scene from Landsat 5 on September 25, 2007. This will be achieved by :

  • loading in the landsat data for each spectral band and creating a singular raster stack
  • cropping landsat data to southern Santa Barbara county, which is the region we have training data for
  • convert values in lansdat raster to correspond with reflectance values
    • conversion of values required removing erraneous values and applying scaling factors
    • as this scene is a Landsat Collection 2 surface reflectance, the scale factor of 0.0000275 with an additional offset of -0.2 per pixel, and the valid pixel values range from 7,273 to 43,636.2
    • using this information, erroneous value will be reclassified as NA and all valid pixels will be updated based on the scaling factor
#loading in landsat data -----------

filelist <- list.files("./data/landsat-data/", full.names = TRUE) #creating a list containing the names all the six spectral band files

landsat <- rast(filelist) #storing bands as a singular raster stack

names(landsat) <- c("blue", "green", "red", "NIR", "SWIR1", "SWIR2") #updating layer names to match the spectral band data they contain

plotRGB(landsat, r = 3, g = 2, b = 1, stretch = "lin") # plot true color image, telling it what bands we would like to put in each color

#loading in study area data -----------

SB_county_south <- st_read("./data/SB_county_south.shp", quiet = TRUE) #reading in the shapefile for southern SB county 

SB_county_south <- st_transform(SB_county_south, crs = st_crs(landsat)) #reprojecting study aea polygon to match the CRS of the landsat data


#cropping landsat data to study site ----------

landsat_cropped <- crop(landsat, SB_county_south) #cropping landsat scene to match the extent of the study site

landsat_mask <- mask(landsat_cropped, SB_county_south) #masking the landsat raster to the sourthern region of Santa Barbara county

rm(landsat, landsat_cropped, SB_county_south) # remove unnecessary object from environment so that R does not need to maintain it in its memory 

plotRGB(landsat_mask, r = 3, g = 2, blue = 1, stretch = "lin") #plotting to ensure that we have masked to our study site, which contains our training data

#convesting landsat values to reflectance -----------

rcl <- matrix(c(-Inf, 7273, NA, 
         43636, Inf, NA),
       ncol = 3, byrow = TRUE) #creating a reclassification matrix defining the valid pixel range, assigning all pixels outside of this range to be NA

landsat <- classify(landsat_mask, rcl = rcl)  #applying reclassification matrix to landat raster

landsat <- (landsat *  0.0000275 - 0.2)*100 #adjusting values based on reflectance scaling factor to get reflectance applying, and multiplying by 100 to get percent reflectance

summary(landsat) #viewing summary to ensure that all values are between 0 - 100, indicated that adjustments were successfully applied
##       blue           green            red             NIR       
##  Min.   : 1.11   Min.   : 0.74   Min.   : 0.00   Min.   : 0.23  
##  1st Qu.: 2.49   1st Qu.: 2.17   1st Qu.: 1.08   1st Qu.: 0.75  
##  Median : 3.06   Median : 4.59   Median : 4.45   Median :14.39  
##  Mean   : 3.83   Mean   : 5.02   Mean   : 4.92   Mean   :11.52  
##  3rd Qu.: 4.63   3rd Qu.: 6.76   3rd Qu.: 7.40   3rd Qu.:19.34  
##  Max.   :39.42   Max.   :53.32   Max.   :56.68   Max.   :57.08  
##  NA's   :39856   NA's   :39855   NA's   :39855   NA's   :39856  
##      SWIR1           SWIR2      
##  Min.   : 0.10   Min.   : 0.20  
##  1st Qu.: 0.41   1st Qu.: 0.60  
##  Median :13.43   Median : 8.15  
##  Mean   :11.88   Mean   : 8.52  
##  3rd Qu.:18.70   3rd Qu.:13.07  
##  Max.   :49.13   Max.   :48.07  
##  NA's   :42892   NA's   :46809

Classify Image

Now that our landsat raster is updated to be a single raster stack containing updated reflectance values for pixels that fall in the allotted range, we must now characterize the percent reflectance values as specific land cover types. This is achieved by:

  • loading in training data shapefile containing polygons characterizing land cover types
  • extracting reflectance values from the landsat raster at each of our study polygons
  • joining extracted reflectance values dataframe to training data dataframe, thereby creating a data frame relating different land cover classification to the associated spectral reflectance values
#loading in training data -----------

training_data <- st_read("data/trainingdata.shp", quiet = TRUE) %>%  
  st_transform(., crs= st_crs(landsat)) #reading in training data shape file and reprojecting it to match the crs of the landsat raster

training_data_attributes <- training_data %>%
  st_drop_geometry() #converting this training data into a data frame

#extracting reflectance values for the training data -----------

training_data_values <- terra::extract(landsat, training_data, df = TRUE) #extracting landsat reflectance values at each of the training site polygons 

#joining training data and extracted reflectance dataframe ------------

SB_training_data <- left_join(training_data_values, training_data_attributes, 
          by = c("ID" = "id")) %>%
  mutate(type = as.factor(type)) #joining the training data attribute dataframe to the dataframe containing extracted reflectance values, and converting landcover type to a factor 

Training Decision Tree

Once we have created a training dataframe containing land cover types and their associated percent spectral reflectance values, we can use this dataframe to train the decision tree. In order to train a decision tree, a model formula must be established, indicating what our response and predictor variables are. In this case, our reponse is the land cover type, and our predicator variables are the different spectral band values. Once we have established a model formula, we can use an rpart function, defining the model formula, dataset, and method, to perform a classification.

#training decision tree ------------

SB_formula <- type ~ red + green + blue + NIR + SWIR1 + SWIR2 #defining model formula

SB_decision_tree <- rpart(formula = SB_formula,
      data = SB_training_data,
      method = "class",
      na.action = na.omit) #training decision tree using rpart package, setting the method to classification and telling it to omit any pixels valued NA from our analysis

prp(SB_decision_tree) #plotting decision tree to ensure training was successful

Applying Decision Tree

The trained decision tree can now be applying to our entire image of southern Santa Barbara county, extending beyond the region outlined in our training sites, to classify land cover types throughout southern SB county. To apply the decision tree, the predict function within the terra package will be utilized. The output of this classification will be a raster layer with integer value, where each value corresponds to the factor level in the training data, representing a specific land cover classification.

#applying decision tree to entire image -----------

SB_classification <- predict(landsat, SB_decision_tree, type = "class", na.rm = TRUE) #telling predict function to predict our landsat scene data based on the trained decision tree, and telling the function to perdict by classification

levels(SB_classification) #insepcting levels of classified output of decision tree to investigate which integer values correspond with which land cover type
## [[1]]
##   value            class
## 1     1 green_vegetation
## 2     2  soil_dead_grass
## 3     3            urban
## 4     4            water

Visualizing decision tree output

Upon applying the decision tree to the original landsat image, a raster is outputted containing the land cover classifications for all of southern Santa Barbara county. We can now plot this raster create a land cover map depicting the land cover classifications across southern SB!

#plotting decision tree output -----------

tm_shape(SB_classification)+ #mapping land cover classification across southern Santa Barbara county
    tm_raster(col.scale = tm_scale_categorical(values = c("#8DB580", "#F2DDA4", "#7E8987", "#6A8EAE")), #changing color palette to logicallt match landcover type
            col.legend = tm_legend(title = "Landcover Type"))+
  tm_layout(title = "Land Cover Classification in Southern Santa Barbara County", title.position = c("right", "bottom"))


  1. Global Land Analysis and Discovery. (n.d.). Land Cover Classification. https://glad.umd.edu/ard/land-cover-classification↩︎

  2. United States Geological Survey. (n.d.). How do I use a scale factor with landsat level-2 science products?. United States Geological Survey | science for a changing world. https://www.usgs.gov/faqs/how-do-i-use-a-scale-factor-landsat-level-2-science-products#:~:text=Landsat%20Collection%202%20surface%20reflectance,offset%20of%20%2D0.2%20per%20pixel.↩︎